Goto

Collaborating Authors

 task generator


Aligning Validation with Deployment: Target-Weighted Cross-Validation for Spatial Prediction

Brenning, Alexander, Suesse, Thomas

arXiv.org Machine Learning

Cross-validation (CV) is commonly used to estimate predictive risk when independent test data are unavailable. Its validity depends on the assumption that validation tasks are sampled from the same distribution as prediction tasks encountered during deployment. In spatial prediction and other settings with structured data, this assumption is frequently violated, leading to biased estimates of deployment risk. We propose Target-Weighted CV (TWCV), an estimator of deployment risk that accounts for discrepancies between validation and deployment task distributions, thus accounting for (1) covariate shift and (2) task-difficulty shift. We characterize prediction tasks by descriptors such as covariates and spatial configuration. TWCV assigns weights to validation losses such that the weighted empirical distribution of validation tasks matches the corresponding distribution over a target domain. The weights are obtained via calibration weighting, yielding an importance-weighted estimator that targets deployment risk. Since TWCV requires adequate coverage of the deployment distribution's support, we combine it with spatially buffered resampling that diversifies the task difficulty distribution. In a simulation study, conventional as well as spatial estimators exhibit substantial bias depending on sampling, whereas buffered TWCV remains approximately unbiased across scenarios. A case study in environmental pollution mapping further confirms that discrepancies between validation and deployment task distributions can affect performance assessment, and that buffered TWCV better reflects the prediction task over the target domain. These results establish task distribution mismatch as a primary source of CV bias in spatial prediction and show that calibration weighting combined with a suitable validation task generator provides a viable approach to estimating predictive risk under dataset shift.





Towards Efficient Neurally-Guided Program Induction for ARC-AGI

Ouellette, Simon

arXiv.org Artificial Intelligence

ARC-AGI is an open-world problem domain in which the ability to generalize out-of-distribution is a crucial quality. Under the program induction paradigm, we present a series of experiments that reveal the efficiency and generalization characteristics of various neurally-guided program induction approaches. The three paradigms we consider are Learning the grid space, Learning the program space, and Learning the transform space. We implement and experiment thoroughly on the first two, and retain the second one for ARC-AGI submission. After identifying the strengths and weaknesses of both of these approaches, we suggest the third as a potential solution, and run preliminary experiments.


Task Me Anything

Zhang, Jieyu, Huang, Weikai, Ma, Zixian, Michel, Oscar, He, Dong, Gupta, Tanmay, Ma, Wei-Chiu, Farhadi, Ali, Kembhavi, Aniruddha, Krishna, Ranjay

arXiv.org Artificial Intelligence

Benchmarks for large multimodal language models (MLMs) now serve to simultaneously assess the general capabilities of models instead of evaluating for a specific capability. As a result, when a developer wants to identify which models to use for their application, they are overwhelmed by the number of benchmarks and remain uncertain about which benchmark's results are most reflective of their specific use case. This paper introduces Task-Me-Anything, a benchmark generation engine which produces a benchmark tailored to a user's needs. Task-Me-Anything maintains an extendable taxonomy of visual assets and can programmatically generate a vast number of task instances. Additionally, it algorithmically addresses user queries regarding MLM performance efficiently within a computational budget. It contains 113K images, 10K videos, 2K 3D object assets, over 365 object categories, 655 attributes, and 335 relationships. It can generate 750M image/video question-answering pairs, which focus on evaluating MLM perceptual capabilities. Task-Me-Anything reveals critical insights: open-source MLMs excel in object and attribute recognition but lack spatial and temporal understanding; each model exhibits unique strengths and weaknesses; larger models generally perform better, though exceptions exist; and GPT4o demonstrates challenges in recognizing rotating/moving objects and distinguishing colors.


Attention-Driven Multi-Agent Reinforcement Learning: Enhancing Decisions with Expertise-Informed Tasks

Kuroswiski, Andre R, Wu, Annie S, Passaro, Angelo

arXiv.org Artificial Intelligence

In this paper, we introduce an alternative approach to enhancing Multi-Agent Reinforcement Learning (MARL) through the integration of domain knowledge and attention-based policy mechanisms. Our methodology focuses on the incorporation of domain-specific expertise into the learning process, which simplifies the development of collaborative behaviors. This approach aims to reduce the complexity and learning overhead typically associated with MARL by enabling agents to concentrate on essential aspects of complex tasks, thus optimizing the learning curve. The utilization of attention mechanisms plays a key role in our model. It allows for the effective processing of dynamic context data and nuanced agent interactions, leading to more refined decision-making. Applied in standard MARL scenarios, such as the Stanford Intelligent Systems Laboratory (SISL) Pursuit and Multi-Particle Environments (MPE) Simple Spread, our method has been shown to improve both learning efficiency and the effectiveness of collaborative behaviors. The results indicate that our attention-based approach can be a viable approach for improving the efficiency of MARL training process, integrating domain-specific knowledge at the action level.


CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning

Ahmed, Ossama, Träuble, Frederik, Goyal, Anirudh, Neitz, Alexander, Wüthrich, Manuel, Bengio, Yoshua, Schölkopf, Bernhard, Bauer, Stefan

arXiv.org Machine Learning

Despite recent successes of reinforcement learning (RL), it remains a challenge for agents to transfer learned skills to related environments. To facilitate research addressing this problem, we propose CausalWorld, a benchmark for causal structure and transfer learning in a robotic manipulation environment. The environment is a simulation of an open-source robotic platform, hence offering the possibility of sim-to-real transfer. Tasks consist of constructing 3D shapes from a given set of blocks - inspired by how children learn to build complex structures. The key strength of CausalWorld is that it provides a combinatorial family of such tasks with common causal structure and underlying factors (including, e.g., robot and object masses, colors, sizes). The user (or the agent) may intervene on all causal variables, which allows for fine-grained control over how similar different tasks (or task distributions) are. One can thus easily define training and evaluation distributions of a desired difficulty level, targeting a specific form of generalization (e.g., only changes in appearance or object mass). Further, this common parametrization facilitates defining curricula by interpolating between an initial and a target task. While users may define their own task distributions, we present eight meaningful distributions as concrete benchmarks, ranging from simple to very challenging, all of which require long-horizon planning as well as precise low-level motor control. Finally, we provide baseline results for a subset of these tasks on distinct training curricula and corresponding evaluation protocols, verifying the feasibility of the tasks in this benchmark.


Adaptive Procedural Task Generation for Hard-Exploration Problems

Fang, Kuan, Zhu, Yuke, Savarese, Silvio, Fei-Fei, Li

arXiv.org Machine Learning

We introduce Adaptive Procedural Task Generation (APT-Gen), an approach to progressively generate a sequence of tasks as curricula to facilitate reinforcement learning in hard-exploration problems. At the heart of our approach, a task generator learns to create tasks from a parameterized task space via a black-box procedural generation module. To enable curriculum learning in the absence of a direct indicator of learning progress, we propose to train the task generator by balancing the agent's performance in the generated tasks and the similarity to the target tasks. Through adversarial training, the task similarity is adaptively estimated by a task discriminator defined on the agent's experiences, allowing the generated tasks to approximate target tasks of unknown parameterization or outside of the predefined task space. Our experiments on grid world and robotic manipulation task domains show that APT-Gen achieves substantially better performance than various existing baselines by generating suitable tasks of rich variations.